Scalable and Fault Tolerant Platform for Distributed Learning on Private Medical Data
Authors
Abstract
Medical image data is naturally distributed among clinical institutions. This partitioning, combined with security and privacy restrictions on medical data, imposes limitations on machine learning algorithms in clinical applications, especially for small and newly established institutions. We present InsuLearn: an intuitive and robust open-source platform designed to facilitate distributed learning (classification and regression) on medical image data, while preserving data security and privacy. InsuLearn is built on ensemble learning, in which statistical models are developed at each institution independently and combined at secure coordinator nodes. InsuLearn protocols are designed such that the liveness of the system is guaranteed as institutions join and leave the network. Coordination is implemented as a cluster of replicated state machines, making it tolerant to individual node failures. We demonstrate that InsuLearn successfully integrates accurate models for horizontally partitioned data while preserving privacy.
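The ensemble idea in the abstract (models trained independently at each institution, then combined at a coordinator) can be illustrated with a minimal sketch. This is not InsuLearn's code or API: the abstract does not state how models are combined, so the majority vote here is an assumption, and `LocalModel`, `coordinator_predict`, and the toy one-dimensional data are hypothetical names and values chosen only for illustration.

```python
from collections import Counter
from statistics import mean


class LocalModel:
    """Hypothetical per-institution model: a 1-D nearest-centroid classifier."""

    def __init__(self, samples):
        # samples: (feature, label) pairs; in this sketch the raw samples are
        # used only here and are not stored, so only centroids are shared.
        by_label = {}
        for x, y in samples:
            by_label.setdefault(y, []).append(x)
        self.centroids = {y: mean(xs) for y, xs in by_label.items()}

    def predict(self, x):
        # Return the label whose centroid is closest to x.
        return min(self.centroids, key=lambda y: abs(x - self.centroids[y]))


def coordinator_predict(models, x):
    """Combine local predictions by majority vote (an assumed combination rule).

    Ties are resolved in favor of the label that was voted for first.
    """
    votes = Counter(m.predict(x) for m in models)
    return votes.most_common(1)[0][0]


# Three institutions, each training on its own (toy) horizontal partition.
site_a = LocalModel([(0.1, "benign"), (0.2, "benign"), (0.9, "malignant")])
site_b = LocalModel([(0.0, "benign"), (0.8, "malignant"), (1.0, "malignant")])
site_c = LocalModel([(0.3, "benign"), (0.7, "malignant")])

models = [site_a, site_b, site_c]
prediction = coordinator_predict(models, 0.85)

# The ensemble still answers if an institution leaves the network.
prediction_without_c = coordinator_predict(models[:2], 0.1)
```

Because the coordinator only needs whichever models are currently registered, dropping or adding an entry in `models` does not block prediction, which loosely mirrors the join-and-leave behavior the abstract describes. The replicated-state-machine coordination is outside the scope of this sketch.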
Similar resources
SCOPE: Scalable Composite Optimization for Learning on Spark
Many machine learning models, such as logistic regression (LR) and support vector machine (SVM), can be formulated as composite optimization problems. Recently, many distributed stochastic optimization (DSO) methods have been proposed to solve the large-scale composite optimization problems, which have shown better performance than traditional batch methods. However, most of these DSO methods m...
CumuloNimbo: A Cloud Scalable Multi-tier SQL Database
This article presents an overview of the CumuloNimbo platform. CumuloNimbo is a framework for multi-tier applications that provides scalable and fault-tolerant processing of OLTP workloads. The main novelty of CumuloNimbo is that it provides a standard SQL interface and full transactional support without resorting to sharding and no need to know the workload in advance. Scalability is achieved ...
Towards High-performance and Fault-tolerant Distributed Java Implementations
Java Virtual Machines form an important part of the web and business server market. Distributed Java Virtual Machines have the potential to make a significant contribution to industries that utilize this technology. An attractive platform for this purpose is the cluster, a highly cost-effective and scalable parallel computer model. However, realizing on such a platform a high performance virtua...
Fault-tolerant control for Scalable Distributed Data Structures
Scalable Distributed Data Structures (SDDS) can be applied for multicomputers. Multicomputers were developed as a response to market demand for scalable and dependable but not expensive systems. SDDS consists of two components dynamically spread across a multicomputer: records belonging to a file and a mechanism controlling record placement in the file. Methods of making records of the file mor...
A Distributed Recommendation Platform for Big Data
The vast amount of information that recommenders manage these days has reached a point where scalability has become a critical factor. In this work, we propose a scalable architecture designed for computing Collaborative Filtering recommendations in a Big Data scenario. In order to build a highly scalable and fault-tolerant platform, we employ fully distributed systems without any single point ...
Journal title:
Volume / Issue:
Pages: -
Publication date: 2017